Today I finally tracked down a bug I had been trying to get rid of for roughly one year. It occurred in the context of my thesis, when computing the Laplace approximation (LA) to a non-Gaussian state space model. In my case, the LA turns out to be a Gaussian state space model (SSM), which is very useful from a computational standpoint: for these models, efficient algorithms such as the Kalman filter and smoother are available.
The LA produces, for my applications, synthetic observations \(z_{0}, \dots, z_{n}\) with corresponding covariance matrices \(\Omega_0, \dots, \Omega_n\) that can be integrated into the model by
\[ z_{t} = s_{t} + \eta_{t}, \quad \eta_{t} \sim \mathcal{N}(0, \Omega_{t}), \]
where \(s_t\) denotes the model's latent signal.
For the model to be valid, the \(\Omega_t\) matrices have to be positive semi-definite (they are covariance matrices, after all), and \(\eta_t = z_t - s_t\) has to lie in the image of the Cholesky root of \(\Omega_t\).
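Both conditions are easy to check numerically. Here is a minimal NumPy sketch (the helper name and tolerance are illustrative, not code from my thesis); since the image of the Cholesky root \(L\), with \(\Omega = LL^\top\), coincides with the image of \(\Omega\) itself, the second check can work with \(\Omega\) directly:

```python
import numpy as np

def check_valid(z, s, Omega, tol=1e-8):
    """Check the two validity conditions for a synthetic observation.

    Illustrative helper: Omega must be positive semi-definite, and
    eta = z - s must lie in the image of the Cholesky root L of Omega.
    Since range(L) = range(L @ L.T) = range(Omega), it suffices to
    test against Omega directly.
    """
    eta = z - s
    # Condition 1: no eigenvalue of the symmetric matrix Omega lies
    # below -tol (allowing for floating-point noise).
    psd = np.linalg.eigvalsh(Omega).min() >= -tol
    # Condition 2: the least-squares fit Omega @ x is the orthogonal
    # projection of eta onto range(Omega); eta lies in the image iff
    # the projection residual vanishes.
    x = np.linalg.lstsq(Omega, eta, rcond=None)[0]
    in_image = np.allclose(Omega @ x, eta, atol=tol)
    return psd, in_image
```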
However, as you might guess, in some cases \(\Omega_t\) turned out to be indefinite, sometimes with comparatively large negative eigenvalues. This seemed to occur whenever I marked data as missing, so surely something had to be wrong in my implementation of this missingness, right? So I went through the code for missing data with great scrutiny - creating figures and tests, debugging my way through many functions. But to no avail - everything seemed to work as expected.
I stepped away from the problem and returned to it occasionally, hoping that a fresh state of mind would help me identify the problem, but for half a year nothing I tried worked.
Today, I finally found the bug, and I feel a bit stupid: it is something a freshman taking a numerics course could have told me. When you invert a positive definite matrix numerically, rounding errors can make the computed inverse indefinite: the errors are roughly of relative size \(\varepsilon\,\kappa(\Omega)\) (machine precision times condition number), and once they exceed the smallest eigenvalue of the exact inverse, that eigenvalue's sign can flip. As it turns out, the missingness I implemented led to very ill-conditioned matrices which, once inverted, produced negative eigenvalues in the should-be positive definite matrices.
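To see how little it takes, here is a small, self-contained NumPy demonstration (an illustration written for this post, not my thesis code): an exactly positive definite matrix with condition number around \(10^{10}\) whose numerically computed inverse comes out indefinite.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# A symmetric positive definite matrix whose eigenvalues span ten
# orders of magnitude, i.e. cond(A) ~ 1e10.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * np.logspace(-10, 0, n)) @ Q.T
A = (A + A.T) / 2  # enforce exact symmetry

# The exact inverse is positive definite with smallest eigenvalue 1,
# but the numerical inverse carries errors of order
# eps * cond(A) * norm(inv(A)) ~ 1e4, easily flipping the sign of
# the small eigenvalues.
A_inv = np.linalg.inv(A)
A_inv = (A_inv + A_inv.T) / 2  # inversion does not preserve symmetry exactly

print(np.linalg.eigvalsh(A_inv).min())  # typically clearly negative
```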